
refactor(viewer): consolidate Qt6 boards onto cage + Wayland #2883

Closed
vpetersson wants to merge 1 commit into worktree-jiggly-sauteeing-stonebraker from chore/wayland-consolidate-qt6

Conversation

@vpetersson
Contributor

Note: Stacked on top of #2879 (the generic-arm64 PR), so the diff here is the single consolidation commit. Base is worktree-jiggly-sauteeing-stonebraker; rebase onto master once #2879 lands.

Description

Pi4-64, Pi5, x86, and arm64 now all run the viewer under cage (a wlroots kiosk compositor) with Qt on QT_QPA_PLATFORM=wayland and mpv on --vo=gpu --gpu-context=wayland. Previously only x86 and (per #2879) generic arm64 took this path; Pi4-64 and Pi5 ran Qt linuxfb + mpv --vo=drm directly on KMS. Folding all four onto one display stack drops a per-board branch each in image_builder, Dockerfile.viewer, start_viewer.sh, and media_player.py.

Notable per-file changes:

  • tools/image_builder/utils.py: cage + qt6-wayland move out of the per-board branch into the shared is_qt6 block (now in every Qt6 image, not just x86/arm64). va-driver-all stays x86-only: there is no VAAPI on ARM SoCs (Pi uses v4l2-request/m2m; Rockchip/Allwinner/Amlogic use V4L2 M2M too).
  • docker/Dockerfile.viewer.j2: ENV QT_QPA_PLATFORM=wayland gated on is_qt6 instead of board in ('x86', 'arm64').
  • bin/start_viewer.sh: case "$DEVICE_TYPE" in x86|arm64|pi4-64|pi5) now wraps the viewer in cage on all four boards. The render-GID mirror that previously only mattered on x86 (VAAPI / /dev/dri/renderD128) also applies on Pi, because the GL context (V3D render node) needs the same access for --vo=gpu --gpu-context=wayland.
  • src/anthias_viewer/media_player.py: --vo=gpu --gpu-context=wayland for every Qt6 board; the previous --drm-mode=1920x1080@60 pin is dropped on Pi4-64/Pi5 (no-op under cage anyway, and GPU scaling handles 4K without the A72/A76 CPU zimg upscale that the pin was working around). --vd-lavc-threads=4 stays.
  • tests/test_media_player.py: assertions updated; new parametrised test covers all four Qt6 device types getting --vo=gpu --gpu-context=wayland.
  • website/data/faq.yaml: two entries that claimed "no Wayland compositor in the stack" are corrected.
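
As a rough illustration of what the media_player.py change in this description amounts to, the sketch below selects the mpv video-output flags by device type. The function name `vo_args` and the `QT6_DEVICE_TYPES` set are hypothetical names made up for this example, not the actual Anthias code; only the flags and the device-type strings come from the description above.

```python
from typing import List

# Hypothetical names for illustration; the real module may be organised differently.
QT6_DEVICE_TYPES = frozenset({"pi4-64", "pi5", "x86", "arm64"})


def vo_args(device_type: str) -> List[str]:
    """Return the mpv video-output flags this PR proposes for a Qt6 board."""
    if device_type not in QT6_DEVICE_TYPES:
        # Boards outside the Qt6 set are out of scope for this sketch.
        raise ValueError(f"not a Qt6 device type: {device_type}")
    # One shared path: GL output into the Wayland surface that cage provides.
    return ["--vo=gpu", "--gpu-context=wayland"]
```

With a single return path there is no per-board branch left to test, which is why one parametrised test over the four device types is enough.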

Validation

On-device frame-drop comparison on a Pi4 Model B (Debian Trixie, DEVICE_TYPE=pi4-64), 30 s of 1080p H.264 over mpv --hwdec=auto-safe from inside the viewer container:

| VO flags | Asset 1 drops/30s | Asset 2 drops/30s |
| --- | --- | --- |
| --vo=drm --drm-mode=1920x1080@60 (current) | 59–75 | 19–20 |
| --vo=gpu --gpu-context=drm (no cage) | 3–6 | 3 |

--vo=gpu --gpu-context=wayland under cage is measured separately on the rebuilt viewer image (results in the PR comments). decoder-frame-drop-count was 0 in every run, so the drops are all on the VO side, which is exactly the path the move from CPU zimg upscale to GPU scaling improves.

Checklist

  • I have performed a self-review of my own code.
  • New and existing unit tests pass locally and on CI with my changes.
  • I have done an end-to-end test for Raspberry Pi devices.
  • I have tested my changes for x86 devices.
  • I have added documentation for the changes I have made (when necessary).

Pi4-64, Pi5, x86, and arm64 now all run the viewer under `cage`
(a wlroots kiosk compositor) with mpv on --vo=gpu
--gpu-context=wayland and Qt on QT_QPA_PLATFORM=wayland. Previously
only x86 and (per the in-flight arm64 PR) generic arm64 took this
path; Pi4-64 and Pi5 ran Qt linuxfb + mpv --vo=drm directly on KMS.
Folding them all onto one display stack drops a per-board branch
each in image_builder, Dockerfile.viewer, start_viewer.sh, and
media_player.

--drm-mode=1920x1080@60 is dropped on Pi4-64/Pi5: under cage mpv
doesn't hold DRM master so the flag is a no-op, and the GPU does
the scaling that the CPU zimg upscale at 4K previously couldn't
keep up with. --vd-lavc-threads=4 stays.

cage + qt6-wayland move from the per-board apt extension into the
shared is_qt6 branch in image_builder. va-driver-all stays
x86-only (no VAAPI on ARM). The render-GID mirror in
start_viewer.sh now applies to all four Qt6 boards because the GL
context on Pi also needs /dev/dri/renderD128 access.

FAQ entries that claimed "no Wayland compositor in the stack" are
updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from a team as a code owner May 12, 2026 21:08
@vpetersson vpetersson self-assigned this May 12, 2026
@vpetersson vpetersson requested a review from Copilot May 12, 2026 21:08

Copilot AI left a comment

Pull request overview

This PR consolidates all Qt6-capable device types (pi4-64, pi5, x86, arm64) onto a single viewer display stack: running the viewer under cage with Qt on QT_QPA_PLATFORM=wayland, and mpv on --vo=gpu --gpu-context=wayland. This removes prior per-board branching (notably the Pi KMS --vo=drm path) and aligns runtime/build configuration, tests, and docs accordingly.

Changes:

  • Move cage + qt6-wayland into the shared Qt6 viewer image dependencies and gate QT_QPA_PLATFORM=wayland on is_qt6.
  • Wrap the viewer in cage for all Qt6 boards in start_viewer.sh, including render-node group mirroring for GPU access.
  • Update mpv invocation defaults and unit tests to assert Wayland GL VO usage across all Qt6 device types; adjust FAQ entries to reflect the Wayland compositor in the stack.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| website/data/faq.yaml | Updates FAQ wording to reflect the new cage/Wayland-based viewer stack. |
| tools/image_builder/utils.py | Consolidates Qt6 viewer apt deps to include cage and qt6-wayland across boards. |
| docker/Dockerfile.viewer.j2 | Gates QT_QPA_PLATFORM=wayland on is_qt6 for all Qt6 images. |
| bin/start_viewer.sh | Runs viewer under cage for all Qt6 boards and mirrors host render-node GID. |
| src/anthias_viewer/media_player.py | Switches mpv VO to Wayland GL (--vo=gpu --gpu-context=wayland) and adjusts Pi tuning. |
| tests/test_media_player.py | Updates/extends assertions to cover Wayland VO on all Qt6 device types. |


Comment on lines +187 to +191
# straight on KMS/DRM, with mpv on --vo=drm. Qt6 boards (pi4-64,
# pi5, x86, arm64) run the viewer under `cage` (a kiosk wlroots
# compositor) with QT_QPA_PLATFORM=wayland and mpv on --vo=gpu
# --gpu-context=wayland — no X code path on either track. The
# cage + qt6-wayland pair is added to the Qt6 apt extension below.
# Covers x86 (balenaOS doesn't expose /dev/fb0), arm64 Armbian
# boards (Rock Pi / Orange Pi / Banana Pi, …), and the Pi4-64 / Pi5
# (consolidated onto Wayland so all Qt6 boards share one display
# stack — see #TBD).
Comment thread website/data/faq.yaml
- question: How do I rotate the screen for portrait orientation?
answer: |
Rotation happens at the kernel / firmware level. The Anthias viewer renders straight to the Linux framebuffer (Qt `linuxfb`) and to KMS (mpv `--vo=drm`) — there's no Wayland compositor in the stack — so the standard Raspberry Pi config knobs apply directly.
Rotation happens at the kernel / firmware level on the Pi. The viewer renders through `cage` (a kiosk Wayland compositor that talks straight to KMS) on Pi4/Pi5/x86/arm64, and through Qt `linuxfb` directly on legacy 32-bit Pi boards (Pi2/Pi3) — neither path goes through a desktop compositor, so the standard Raspberry Pi config knobs apply directly.
@vpetersson
Contributor Author

On-device validation blocked the Pi4-64/Pi5 consolidation

Tested the rebuilt viewer image on a real Pi4 Model B (Debian Trixie, 4K display via HDMI). Same 1080p H.264 asset, same 30-s window, frame drops via mpv ${frame-drop-count}:

| Path | mpv flags | Drops / 30s |
| --- | --- | --- |
| Current production | --vo=drm --drm-mode=1920x1080@60 --vd-lavc-threads=4 --hwdec=auto-safe | 59–75 |
| GPU output, no cage | --vo=gpu --gpu-context=drm --hwdec=auto-safe | 3–6 |
| This PR (cage + Wayland) | --vo=gpu --gpu-context=wayland --vd-lavc-threads=4 --hwdec=auto-safe | 738+, slow-motion playback |

decoder-frame-drop-count was 0 in every run — the regression is on the VO side.

Why this regresses on Pi

  1. The Pi V3D can't composite 4K in real time on top of a separate Wayland surface. cage selects the connector's preferred mode (4K on this TV) and mpv renders into a Wayland surface that cage composites and scans out. The current --drm-mode=1920x1080@60 pin sidesteps this by running the connector at 1080p; under cage that flag is a no-op (cage holds DRM master).
  2. mpv 0.40 in Debian Trixie has no v4l2request hwdec, only v4l2m2m-copy, which falls back to software decode under cage in this image (hwdec=no confirmed via ${hwdec-current}). No DMA-BUF zero-copy path is available.
  3. --vo=dmabuf-wayland is not viable — it would zero-copy from decoder to wayland surface, but it has no hwdec to receive from (see point 2), and the existing code comment notes it segfaults under cage on x86 anyway.

Options

  • (A) Drop Pi4-64/Pi5 from this PR. Keep x86+arm64 on cage+Wayland (already there from #2879, "feat(install): generic-arm64 best-effort support (Armbian SBCs)"). Effectively close this PR.
  • (B) Force a 1080p output mode on Pi via kernel cmdline (video=HDMI-A-1:1920x1080@60 in /boot/firmware/cmdline.txt), so cage runs at 1080p and the V3D has the same upscale headroom it had with --drm-mode. Requires an Ansible/install change to ship.
  • (C) Wait for newer mpv (v4l2request hwdec + drm-prime interop) and revisit.

Leaning toward (A) — the consolidation goal hinged on Pi being able to keep up, and on this hardware + mpv combo it can't. Marking the PR draft pending direction.

Side-finding worth keeping

Even without cage, just switching the Pi from --vo=drm to --vo=gpu --gpu-context=drm (offloading scaling from CPU zimg to V3D) cuts drops from 59–75 to 3–6 on this Pi4. That's a separate small win worth landing on its own if we don't ship the full consolidation.

(Tested on a 4K-connected Pi4-64 Rev 1.5; the Pi5 has roughly 2× the V3D throughput so it might just squeak through, but I'd want the same test there before assuming.)

@vpetersson vpetersson marked this pull request as draft May 12, 2026 22:35
vpetersson added a commit that referenced this pull request May 13, 2026
…4 to 1080p

Folds in PR #2883: Pi 4-64 / Pi 5 now run under cage with mpv on
--vo=gpu --gpu-context=wayland, joining x86 and arm64 on a single
Wayland-based display stack. Drops the --vo=drm legacy path
entirely from MPVMediaPlayer. Qt 5 boards (pi2 / pi3) stay on
linuxfb via VLCMediaPlayer — out of scope here.

Replaces the perf branch's `--vo=gpu --gpu-context=drm` standalone
fix with the consolidated cage path. The previous standalone
finding (3-6 vo drops / 30 s on Pi 4 at 4K) was a Pi-without-cage
optimization; once Pi runs under cage like every other Qt6 board,
the same trick applies via wayland but cage's composite step adds
its own pass and the V3D on Pi 4 can't keep up at 4K (738 vo
drops / 30 s measured at native 4K under cage). Fix: move the
1080p mode pin one layer up from app code to host config — the
new ansible/.../cmdline.txt.j2 conditional appends
`video=HDMI-A-1:1920x1080@60 video=HDMI-A-2:1920x1080@60` when
`device_type == 'pi4-64'`. With output pinned to 1080p there's no
upscale anywhere in the pipeline, matching the bandwidth profile
of today's --vo=drm production setup.
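
The conditional described above lives in a Jinja2 template, but its logic can be sketched in Python for clarity. This is a minimal sketch under stated assumptions: `render_cmdline` and `HDMI_1080P_PIN` are hypothetical names, and the idempotence check is an addition for illustration, not something the commit message states.

```python
# The pin string is quoted from the commit message; everything else is illustrative.
HDMI_1080P_PIN = "video=HDMI-A-1:1920x1080@60 video=HDMI-A-2:1920x1080@60"


def render_cmdline(base: str, device_type: str) -> str:
    """Append the 1080p HDMI mode pin on Pi 4 only; leave other boards untouched."""
    base = base.strip()
    if device_type != "pi4-64" or HDMI_1080P_PIN in base:
        # Other boards keep the connector's preferred mode; never pin twice.
        return base
    return " ".join(part for part in (base, HDMI_1080P_PIN) if part)
```

Keeping the pin in host configuration rather than in mpv's arguments matters because, under cage, mpv no longer holds DRM master and cannot set the mode itself.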

Pi 5 / x86 / arm64 keep the connector's preferred mode (typically
4K). Pi 5's V3D 7.1 has roughly 2× Pi 4's throughput; x86 iGPUs
handle 4K via VAAPI; arm64 SBC perf varies by SoC.

Other notable changes folded in from #2883:

* tools/image_builder/utils.py — `cage` + `qt6-wayland` move out
  of the per-board branch into the shared is_qt6 block.
  `wlr-randr` (was x86-only) goes in the shared block too since
  rotation now happens via wlr-randr on every Qt6 board.
  `va-driver-all` stays x86-only (no VAAPI on Pi / ARM SoCs).
* docker/Dockerfile.viewer.j2 — QT_QPA_PLATFORM=wayland gated on
  is_qt6 instead of board in ('x86', 'arm64').
* bin/start_viewer.sh — case on DEVICE_TYPE: every Qt6 board
  takes the cage + sudo path. Pi2 / Pi3 stay on the legacy
  direct-sudo path.
* src/anthias_viewer/media_player.py — single --vo=gpu
  --gpu-context=wayland for all reachable device types. The
  per-board rotate_args block is gone: every Qt6 device inherits
  the transform from cage via wlr-randr, so mpv would
  double-rotate if it set --video-rotate.
* tests/test_media_player.py — parametrised tests for all four
  Qt6 boards (x86, arm64, pi4-64, pi5) hitting the same VO path;
  rotation tests assert mpv *never* sets --video-rotate under
  cage.
* website/data/faq.yaml — rotation entry points at Settings page
  / wlr-randr; resolution entry calls out the Pi 4 1080p pin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson
Contributor Author

Closing. Folded into #2885, which now consolidates Pi 5 / x86 / arm64 onto cage + Wayland (Pi 4 stays on linuxfb — on-device testing showed the V3D 6.0 can't keep up with cage's composite pass on top of mpv, even with a 1080p HDMI mode pin).

The base of this PR (worktree-jiggly-sauteeing-stonebraker) merged to master as #2879, so the cage path is already live for x86 / arm64. #2885 picks up Pi 5 (and a Pi 4 perf improvement on the linuxfb path) on top of it.

@vpetersson vpetersson closed this May 13, 2026
vpetersson added a commit that referenced this pull request May 18, 2026
…oad (#2885)

* perf(viewer): pi4-64/pi5 use mpv --vo=gpu --gpu-context=drm

On Pi the connector's preferred mode is usually 4K (most modern
TVs report 3840x2160 in their EDID), and the previous --vo=drm
path ran a CPU zimg upscale from 1080p source to that 4K output.
On a 4-core A72 that's the bottleneck — mpv VO drops 59-75
frames per 30s on a stock 1080p H.264 signage clip. Pi5's A76
is faster but the same upscale path is still the limit.

Switching the VO to GL with the DRM context (mpv --vo=gpu
--gpu-context=drm) hands the upscale to the V3D and leaves
everything else identical — mpv still owns DRM master, still
reads --drm-mode=1920x1080@60 (kept), still runs in
--vd-lavc-threads=4 software decode (mpv 0.40 in Debian Trixie
has v4l2m2m-copy but not v4l2request, so --hwdec=auto-safe
falls back to software on this asset; that hasn't changed).

Measured on a 4K-connected Pi4-64 Rev 1.5, same clip, same 30 s
window:

  --vo=drm                                : 59-75 vo drops / 30 s
  --vo=gpu --gpu-context=drm (this patch) : 3-6 vo drops / 30 s

`decoder-frame-drop-count` is 0 in both — the regression was
purely on the VO side, and shifting scaling off the CPU is what
buys the headroom.

x86 (cage + --gpu-context=wayland) is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(viewer): drop --drm-mode pin on Pi4-64/Pi5 under --gpu-context=drm

The previous commit moved Pi4-64/Pi5 to `mpv --vo=gpu
--gpu-context=drm` but kept the `--drm-mode=1920x1080@60` pin
from the old --vo=drm path. On-device testing showed the pin
*hurts* throughput under GBM: 294 vo drops/30s with the pin,
3-6 without, on the same 4K-connected Pi4 and the same H.264
clip.

The pin existed in the first place to dodge CPU zimg upscale to
4K, which the A72 couldn't keep up with on the legacy --vo=drm
path. Under --gpu-context=drm the V3D does the scaling for free
at the connector's preferred mode, so the workaround is no
longer needed and is in fact harmful.

`--vd-lavc-threads=4` stays — software decode under
--hwdec=auto-safe (mpv 0.40 has v4l2m2m-copy but not
v4l2request) still benefits from explicit threading.

Verified on a 4K-connected Pi4-64 across H.264 (30/24 fps) and
HEVC clips: 2-6 vo drops/30s in every case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(viewer): consolidate Qt6 boards onto cage + Wayland, pin Pi 4 to 1080p


* fix(ansible): propagate tags into boot.yml include_tasks

The `Configure boot partition` task in system/tasks/main.yml was
tagged `touches-boot-partition` / `raspberry-pi` but those tags
weren't propagated to the tasks inside boot.yml — Ansible's
default include_tasks behaviour matches the include against
--tags but leaves the included tasks tag-less, so they get
filtered back out. Running `ansible-playbook ... --tags
touches-boot-partition` therefore did nothing.

Use the explicit `apply: tags:` form so the include's tags are
copied onto each task in boot.yml. With this, the standalone
"re-render boot config" workflow actually works, which matters
on Pi 4 now that the 1080p HDMI mode pin in cmdline.txt.j2
needs to land without re-running the whole playbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): keep Pi 4 on linuxfb; only Pi 5 / x86 / arm64 go cage

On-device testing on a Pi 4 Model B Rev 1.5 with a 4K HDMI display
showed cage+wayland is fundamentally too heavy for the V3D 6.0:

  --vo=drm    (existing, no cage)                : 59-75 drops/30s
  --vo=gpu --gpu-context=drm  (no cage, GPU scale): 3-6 drops/30s
  --vo=gpu --gpu-context=wayland (cage, even at  : 730+ drops/30s,
    1080p HDMI cmdline pin to avoid 4K scale)      mpv at 99% CPU
                                                   running ~1/4×
                                                   real time

The 1080p HDMI pin doesn't recover Pi 4 — cage's composite pass
costs more than the V3D 6.0 has spare bandwidth for, regardless
of output resolution, with the webview running in the background
or not. Pi 5's V3D 7.1 has roughly 2× the throughput and is
expected to keep up; x86 / arm64 already shipped on cage and
remain unchanged.

Net result:

  * Pi 4-64 stays on Qt linuxfb (no compositor) with mpv on
    --vo=gpu --gpu-context=drm. mpv writes straight to KMS via
    libgbm and lets the V3D do video scaling — keeping the
    standalone perf-branch finding that drops from 59-75 → 3-6
    on the same clip.
  * Pi 5 / x86 / arm64 stay (or move) onto cage + qt6-wayland +
    wlr-randr with mpv on --vo=gpu --gpu-context=wayland.
  * Pi 2 / Pi 3 stay on the Qt5 + VLC + linuxfb track they were
    already on.
  * The Pi 4 1080p HDMI cmdline pin added in the previous commit
    is reverted (no longer needed without cage).
  * Rotation handling: mpv emits --video-rotate=N on Pi 4 (no
    compositor to apply the transform) and skips it on the cage
    boards (wlr-randr handles it there).
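
The rotation rule in the last bullet could look roughly like the following. This is a hedged sketch: `rotate_args` and `CAGE_DEVICE_TYPES` are hypothetical names, and skipping the flag for a rotation of 0 is an assumption made here for tidiness.

```python
from typing import List

# Boards that run under cage, where wlr-randr applies the output transform.
CAGE_DEVICE_TYPES = frozenset({"pi5", "x86", "arm64"})


def rotate_args(device_type: str, rotation: int) -> List[str]:
    """Return the mpv rotation flag, or nothing when the compositor rotates."""
    if rotation not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {rotation}")
    if device_type in CAGE_DEVICE_TYPES or rotation == 0:
        # Under cage the whole output is already transformed; adding
        # --video-rotate here would rotate the video a second time.
        return []
    return [f"--video-rotate={rotation}"]
```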

Goal-wise this is the partial-consolidation we agreed to as last
resort: three of four Qt6 boards share one Wayland stack, Pi 4
keeps the framebuffer path for as long as the V3D 6.0 + mpv 0.40
combo lacks the headroom. Pi 4 remains in scope for revisiting
once mpv ships the v4l2request hwdec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): mirror host render-GID for all Qt 6 boards, not just cage

mpv uses /dev/dri/renderD128 for --vo=gpu on every Qt 6 board
now — wayland (cage path on x86 / arm64 / pi5) and drm (linuxfb
path on Pi 4) both go through Mesa GL. The render-GID mirror was
inside the cage branch of start_viewer.sh, so Pi 4's mpv ran as
viewer user, hit the render node owned by GID 992, got
"Permission denied", and bailed with "Failed initializing any
suitable GPU context!".

Hoist the render-GID setup above the per-board case so it runs
for every Qt 6 board. cage / linuxfb branching stays as-is.
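
The actual mirroring happens in the shell script; the lookup half of it can be sketched in Python as below. `render_node_gid` is a hypothetical helper for illustration. Creating a matching group inside the container and adding the viewer user to it is left out, since the commit message does not spell out how that step is done.

```python
import os
from typing import Optional


def render_node_gid(path: str = "/dev/dri/renderD128") -> Optional[int]:
    """Return the numeric group ID that owns the render node, or None if absent."""
    try:
        return os.stat(path).st_gid
    except FileNotFoundError:
        # No render node, e.g. a build or CI host without a GPU.
        return None
```

Reading the GID from the device node rather than hard-coding it matters because the host's render group number varies between distributions (the message above happens to mention 992).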

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): Pi 4 stays on --vo=drm (Qt linuxfb DRM master contention)

Earlier commits switched Pi 4 to mpv --vo=gpu --gpu-context=drm
based on a 3-6 vo-drop/30 s measurement. That test was run as
root in a fresh container — no Qt linuxfb in the picture. In
the production viewer where AnthiasWebview holds the framebuffer
via Qt linuxfb, --vo=gpu fails:

  failed to open /dev/dri/renderD128: Permission denied
  [vo/gpu/drm] Failed to acquire DRM master: Permission denied
  [vo/gpu] Failed initializing any suitable GPU context!
  Error opening/initializing the selected video_out (--vo) device.
  Video: no video

Mesa GBM holds DRM master persistently and contends with Qt
linuxfb's framebuffer use. mpv's classic --vo=drm has its own
master juggling (briefly grab → render → drop) that coexists
fine with linuxfb — that's why master's existing Pi 4 config
works.

Revert Pi 4 mpv flags to the production master config:
  --vo=drm --drm-mode=1920x1080@60 --vd-lavc-threads=4

The standalone perf-finding from this branch's earlier history
turns out not to apply in production; retracted from the
roll-up. Pi 5 / x86 / arm64 unchanged (they're on cage +
--vo=gpu --gpu-context=wayland, which has its own DRM master
flow via cage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): cage opens on the first connected connector, not HDMI-A-1

Without `-o`, cage uses whatever output the DRM backend enumerates
first — typically HDMI-A-1 on Pi 5 (closer to USB-C) and the
on-board panel / first HDMI on x86 / arm64. If the operator plugs
into the *other* port (Pi 5 HDMI-A-2, or any DP connector on
x86), cage renders to a disconnected connector and the screen
stays black.

start_viewer.sh now iterates /sys/class/drm/card*-*, picks the
first connector whose status reads "connected", strips the
cardN- prefix to get the bare name cage expects (HDMI-A-1,
HDMI-A-2, DP-1, eDP-1, …), and passes it via `-o`. Falls back to
letting cage pick if nothing is connected yet — the display may
come up via HPD after cage starts, or this is a build/CI host
with no display at all.
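
The connector scan described above is implemented in start_viewer.sh; a Python rendering of the same logic is sketched here. `first_connected_connector` is a hypothetical name, the `drm_root` parameter exists only to make the function testable, and sorting the entries is an assumption added to make the result deterministic.

```python
import re
from pathlib import Path
from typing import Optional


def first_connected_connector(drm_root: str = "/sys/class/drm") -> Optional[str]:
    """Return the bare name (e.g. "HDMI-A-2") of the first connected connector."""
    for entry in sorted(Path(drm_root).glob("card*-*")):
        try:
            status = (entry / "status").read_text().strip()
        except OSError:
            continue  # Not a connector directory, or unreadable.
        # Exact match: "disconnected" contains "connected" as a substring.
        if status == "connected":
            # Strip the "cardN-" prefix: "card1-HDMI-A-2" -> "HDMI-A-2".
            return re.sub(r"^card\d+-", "", entry.name)
    return None  # Nothing connected yet; let the compositor pick.
```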

Caught while end-to-end testing on the rig: Pi 5 cable on
HDMI-A-2 went to a black screen even though `cat
/sys/class/drm/card1-HDMI-A-2/status` reported "connected" and
cage / the viewer were running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(viewer): mpv from apt.raspberrypi.com on Pi 4 / Pi 5, hwdec auto-copy

Stock Debian Trixie's mpv 0.40 is compiled without `v4l2request`
hwdec, so Pi 5's Hantro stateless decoder is invisible to it and
mpv falls back to software decode for every H.264 / H.265 source.
Pi 4's V4L2 M2M decoder is reachable via `v4l2m2m-copy` but mpv's
`--hwdec=auto-safe` whitelist explicitly excludes that method, so
auto-detect picked software there too.

Two changes, applied together because they only make sense
together:

* Pi 4 / Pi 5 viewer images now pull mpv (and the FFmpeg library
  family it depends on) from `archive.raspberrypi.com/debian
  trixie main`. The Pi-tuned build ships `v4l2request` hwdec
  (Pi 5) and a maintained `v4l2m2m-copy` (Pi 4). An apt-pin
  restricts the Pi repo to the mpv + libav* packages only, so
  curl / ca-certificates / etc. continue to come from stock
  Debian and the rest of the image stays on the same baseline.
* `MPVMediaPlayer.play()` switches `--hwdec=auto-safe` →
  `--hwdec=auto-copy`. auto-copy is the same family but with a
  broader whitelist that *includes* the v4l2-family copy hwdecs.
  Net effect: x86 still picks vaapi-copy (unchanged), Pi 4 picks
  v4l2m2m-copy, Pi 5 picks v4l2request, arm64 falls through to
  software (no v4l2request in stock Debian mpv, no vendor-tuned
  Rockchip plugin in stock either — Tier-2 follow-up).

Plus an `ANTHIAS_DEBUG_DROPS=1` env knob: when set on the viewer
container, mpv's stdout/stderr go to `/data/.anthias/mpv.log`
(host-bound) instead of `/dev/null`, and `--no-terminal` is
dropped so the status line ("AV: ... Dropped: N") is emitted.
Lets us read per-asset frame-drop counts straight from the
production viewer pipeline (no custom harness, no rebuild)
during the test-grid runs. Default (unset) preserves the silent
behaviour.
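
Reading the drop count back out of that log can be sketched as follows. `last_drop_count` is a hypothetical helper, and the regular expression assumes only the "Dropped: N" field quoted above; the rest of the status line is not relied on.

```python
import re
from typing import Optional

# Matches the "Dropped: N" field of mpv's status line.
_DROPPED_RE = re.compile(r"Dropped:\s*(\d+)")


def last_drop_count(log_text: str) -> Optional[int]:
    """Return the most recent drop count found in the log text, or None."""
    matches = _DROPPED_RE.findall(log_text)
    # The counter is cumulative, so the last occurrence is the final tally.
    return int(matches[-1]) if matches else None
```

Searching the whole text rather than splitting on newlines is deliberate: a status line that is redrawn in place may be separated by carriage returns instead of line feeds.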

Also: drops the `cage -o <connector>` autodetect attempt — cage
0.1.x in Trixie doesn't accept `-o`, just `-m last`. Use that
instead so cage opens on the most-recently-connected output
regardless of HDMI-A-N enumeration order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): use deb-packaged Pi keyring for archive.raspberrypi.com

apt update against http://archive.raspberrypi.com/debian trixie
was failing in the Pi 4 / Pi 5 viewer image builds:

  Sub-process /usr/bin/sqv returned an error code (1):
  Signing key on CF8A1AF502A2AA2D763BAE7E82B129927FA3303E is not
  bound: No binding signature at time …
  Policy rejected non-revocation signature (PositiveCertification)
  requiring second pre-image resistance
  SHA1 is not considered secure since 2026-02-01

Pi's bare `raspberrypi.gpg.key` URL still serves the original
2012-vintage RSA 2048 key with SHA1 binding signatures that
Trixie's sqv refuses to certify under the post-2026-02-01
crypto policy. The deb-packaged keyring inside
`raspberrypi-archive-keyring_2025.1+rpt1_all.deb` ships the
*same* key fingerprint but with rebuilt binding signatures
that sqv accepts — that's the keyring Pi OS Trixie itself
installs, which is why `apt update` against this exact repo
works on a real Pi 5 device today.

Fetch the deb directly with curl, extract its bundled
`.pgp` keyring, and point `signed-by=` at the installed copy.
The pin block restricts what packages the Pi repo can supply
(mpv + libav* + ffmpeg + libpostproc — the FFmpeg family),
so the rest of the image keeps its stock-Debian baseline.

Also extend the pin to cover libpostproc* and ffmpeg, since
mpv's apt deps drag those into the Pi-tagged version on
install; without the pin extension, apt rejected the resolve
with "broken packages".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(viewer): per-codec hwdec on Pi via Lua hook

mpv 0.40's `--hwdec` accepts a single value at startup, so we
can't ask it to try v4l2m2m-copy for H.264 *and* drm-copy for
HEVC out of the box. The Pi-tuned mpv from
archive.raspberrypi.com supports both hwdec methods but each
covers a different codec subset:

* v4l2m2m-copy — Pi 4's V3D V4L2 M2M decoder. H.264 works; Pi
  5's Hantro G2 is V4L2-stateless-only so this no-ops there.
* drm-copy — FFmpeg's `v4l2_request_hevc` hwaccel. HEVC only,
  works on both Pi 4 and Pi 5.

Add a small `on_load` Lua hook (inlined as `_PI_HWDEC_LUA`,
written to /tmp on first play(), loaded with `--script=`) that
checks `video-codec-name` and picks the right hwdec at file
open. Net effect:

  Pi 4 H.264 → v4l2m2m-copy   (HW)
  Pi 4 HEVC  → drm-copy       (HW)
  Pi 5 H.264 → v4l2m2m-copy   (no device, falls back to SW
                                — only path until mpv re-adds
                                v4l2_request_h264 hwdec)
  Pi 5 HEVC  → drm-copy       (HW)

The base `--hwdec=auto-copy` startup value still applies on
x86 / arm64 (vaapi-copy on Intel/AMD; software fall-back on
Rockchip), where the hook isn't loaded.

Verified on real hardware:
  $ mpv ... --script=/tmp/anthias-pi-hwdec.lua test_hevc.mp4
  [pi-hwdec] codec=hevc -> hwdec=drm-copy
  Using hardware decoding (drm-copy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer,server): HW-decode everywhere on Pi 4 / Pi 5 / x86

The previous per-codec Lua hook in media_player.py was a silent no-op:
mpv's video-codec-name property is empty at every script event before
hwdec init (on_load, on_preloaded), so --hwdec=auto-copy leaked through.
auto-copy's upstream whitelist excludes v4l2m2m-copy, so H.264 on Pi 4
fell back to software despite the V3D V4L2 M2M decoder being available.

Viewer (src/anthias_viewer/media_player.py)

- Replace the Lua hook with ffprobe-driven dispatch from Python at
  launch time. ffprobe is in the viewer image; the call is ~50 ms.
- Per-board mapping: Pi 4 → {h264: v4l2m2m-copy, hevc: drm-copy};
  Pi 5 → {hevc: drm-copy}. Pi 5 H.264 falls back to auto-copy
  because mpv has no v4l2-request H.264 hwdec for the Hantro G1,
  and passing v4l2m2m-copy there just logs "Could not find a valid
  device" before SW-falling-back.
- Live-verified on Pi 4: "Using hardware decoding (v4l2m2m-copy)"
  for 1080p H.264 and "Using hardware decoding (drm-copy)" for
  HEVC at 1080p and 4K.
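
A minimal sketch of the dispatch described in this list follows. The names `HWDEC_BY_BOARD`, `probe_video_codec`, `select_hwdec`, and `hwdec_for` are assumptions for illustration and are not the identifiers used in media_player.py; the 5-second timeout is likewise arbitrary. The per-board mapping and the auto-copy fallback are taken from the bullets above.

```python
import subprocess

# Per-board codec -> hwdec mapping, as listed in the commit message.
HWDEC_BY_BOARD = {
    "pi4-64": {"h264": "v4l2m2m-copy", "hevc": "drm-copy"},
    "pi5": {"hevc": "drm-copy"},
}


def probe_video_codec(path: str) -> str:
    """Return the first video stream's codec name via ffprobe, or '' on failure."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=5, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return ""  # ffprobe missing, timed out, or rejected the file.
    return result.stdout.strip()


def select_hwdec(device_type: str, codec: str) -> str:
    """Map (board, codec) to an mpv --hwdec value, defaulting to auto-copy."""
    return HWDEC_BY_BOARD.get(device_type, {}).get(codec, "auto-copy")


def hwdec_for(device_type: str, path: str) -> str:
    """Probe only on boards that have a mapping; other boards skip ffprobe."""
    if device_type not in HWDEC_BY_BOARD:
        return "auto-copy"
    return select_hwdec(device_type, probe_video_codec(path))
```

Separating the pure lookup (`select_hwdec`) from the shell-out keeps the mapping testable without ffprobe installed.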

Asset processor (src/anthias_server/processing.py)

- Pi 5 profile drops H.264 from passthrough_video_codecs — Pi 5
  has no mpv H.264 HW path, so H.264 uploads must transcode to HEVC
  at upload time to keep the HW-decode-everywhere contract.
- Pi 4 profile adds passthrough_video_max_pixels for H.264, capped
  at 1080p (1920*1080). 4K H.264 clears the codec gate but the V3D
  H.264 envelope tops at 1080p60, so the cap forces it through a
  libx265 re-encode at upload time. HEVC keeps no cap (the
  dedicated HEVC block handles 4Kp60).
- _ffprobe_summary now returns video_pixels alongside codec /
  container / audio_codec; _video_can_passthrough enforces the
  per-codec pixel cap when the profile declares one.
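
The passthrough gate described in these bullets can be sketched as below. The dictionary keys `passthrough_video_codecs` and `passthrough_video_max_pixels` come from the commit message, but the shape of the profile objects, the `PROFILES` table, and the function signature are assumptions for illustration; the real processing.py may differ.

```python
from typing import Any, Dict

# Hypothetical profile layout; only the two key names are from the source.
PROFILES: Dict[str, Dict[str, Any]] = {
    "pi4-64": {
        "passthrough_video_codecs": {"h264", "hevc"},
        # H.264 is capped at 1080p; HEVC has no cap.
        "passthrough_video_max_pixels": {"h264": 1920 * 1080},
    },
    "pi5": {
        # H.264 is absent, so H.264 uploads get transcoded to HEVC.
        "passthrough_video_codecs": {"hevc"},
        "passthrough_video_max_pixels": {},
    },
}


def video_can_passthrough(profile: Dict[str, Any], codec: str, video_pixels: int) -> bool:
    """Return True when the upload can skip transcoding for this profile."""
    if codec not in profile["passthrough_video_codecs"]:
        return False
    cap = profile["passthrough_video_max_pixels"].get(codec)
    # No declared cap means any resolution passes the gate for this codec.
    return cap is None or video_pixels <= cap
```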

Tests

- test_media_player.py: new per-board hwdec tests (Pi 4 H.264 →
  v4l2m2m-copy; Pi 5 H.264 → auto-copy; both → drm-copy for HEVC;
  auto-copy fallback when ffprobe fails; no probe on x86 / arm64).
- test_processing.py: matrix tests updated to include video_pixels;
  parametrised rows now exercise Pi 5 H.264-no-passthrough and the
  Pi 4 4K H.264 cap. New end-to-end tests prove
  _run_video_normalisation transcodes Pi 5 H.264 → HEVC and Pi 4
  4K H.264 → HEVC.

Docs (docs/board-enablement.md, new)

- Goal + per-board HW-decode capability table.
- Asset processor codec policy spelled out as a contract.
- BBB test bed recipe (source clips, libx265 transcode commands,
  ANTHIAS_DEBUG_DROPS=1, mpv.log slicing).

Follow-up: Pi 5 4K HEVC HW

The Hantro G2 decoder can't allocate 4K dst buffers from Pi 5's
default 64 MB CMA ("v4l2_request_hevc_start_frame: Failed to get
dst buffer") and SW-falls-back. Adding cma=512M to the kernel
cmdline does NOT work — the kernel takes the cmdline value over
the device-tree linux,cma node, orphaning rpi-hevc-dec ("Failed
to probe hardware -517") and unpopulating /dev/video*, which
kills HEVC HW at every resolution. The right fix is a
dtparam/dtoverlay in /boot/firmware/config.txt that resizes the
existing DT-declared region without orphaning the codec's
reserved-mem reference. Until that lands, the pi5 profile should
downscale 4K → 1080p HEVC. Documented in cmdline.txt.j2 and
docs/board-enablement.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(viewer,server): mock _probe_video_codec; fix mypy on Popen IO types

CI failures on the previous commit (bb27b186) came from:

* ``subprocess.run`` inside ``_probe_video_codec`` blowing up under
  the existing ``mpv`` fixture, which patches ``subprocess.Popen``
  to a MagicMock. ``subprocess.run`` internally instantiates Popen
  for the ffprobe shellout, gets a MagicMock back, then trips on
  unpacking communicate()'s result. Fixed by default-mocking
  ``_probe_video_codec`` in the fixture (returns '' so dispatch
  falls back to 'auto-copy', preserving legacy assertions) and
  layering the same mock onto the standalone rotation tests that
  build MPVMediaPlayer outside the fixture.

* ``ruff format``: the multi-line ffprobe arg list in
  ``_probe_video_codec`` needed splitting one-arg-per-line.

* ``mypy``: typing the popen_stdout / popen_stderr locals as
  ``object`` couldn't satisfy any Popen overload. Switched to
  ``int | IO[bytes]`` which covers both the DEVNULL / STDOUT
  sentinels and the bind-mounted mpv.log file handle.

* ``test_passthrough_containers_match_real_ffprobe_format_names``
  was pinned to the pi5 profile to exercise the H.264 + HEVC
  passthrough path; pi5 no longer passthroughs H.264, and the
  fake summary it constructs has no width/height (so pi4-64's
  cap fails it too). Switched the pin to x86, which has no
  per-codec caps — the test is about *container* recognition, not
  codec/resolution gating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): downscale 4K HEVC → 1080p on Pi 5 (CMA workaround)

Pi 5's Hantro G2 HEVC decoder is rated for 4Kp60 but the stock 64 MB
CMA on Pi OS can't fit a 4K HEVC dst-buffer pool — at 4K mpv hits
``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and
silently SW-falls-back. Bumping cma= on the kernel cmdline orphans
``rpi-hevc-dec`` entirely (the kernel takes the cmdline value over
the device-tree linux,cma node, leaving the driver returning
``Failed to probe hardware -517``), so the kernel-side knob isn't
available without a dtoverlay change.

Until that follow-up lands, the asset processor caps Pi 5 HEVC at
1080p both ways:

* ``passthrough_video_max_pixels`` gates 4K HEVC uploads out of
  passthrough — anything wider than 1920×1080 falls through to a
  re-encode.
* New ``transcode_video_max_pixels`` per-codec field tells
  ``_transcode_to_target`` to emit a
  ``-vf scale='if(gt(ih,1080),-2,iw)':'min(ih,1080)'`` filter that
  caps height at the 16:9 budget (cap_h = floor(sqrt(cap × 9/16))).
  Portrait 4K → 1080p height; landscape 4K → 1920×1080. Sub-1080p
  sources are untouched (the ``min()`` guard prevents upscale; ``-2``
  on width keeps libx265 happy with even dimensions).
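
The height budget and filter string above can be reproduced with a
short sketch. ``scale_filter_for_cap`` is a hypothetical helper name;
only the formula and the filter text come from this commit:

```python
import math


def scale_filter_for_cap(max_pixels: int) -> str:
    """Build the ffmpeg -vf value that caps height at the 16:9 budget."""
    # cap_h = floor(sqrt(cap * 9/16)). For a 1920*1080 cap this is 1080.
    cap_h = math.isqrt(max_pixels * 9 // 16)
    # min() leaves sub-cap sources untouched; -2 keeps the width even.
    return f"scale='if(gt(ih,{cap_h}),-2,iw)':'min(ih,{cap_h})'"
```

For ``max_pixels = 1920 * 1080`` this yields the same filter text
quoted in the bullet above.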

Pi 4 / x86 don't carry the cap (their HW decoders handle 4Kp60
cleanly), so the filter stays absent from those profiles.

Tests cover (a) the new pi5+hevc+4K row in the parametrised
passthrough matrix (False at 4K, True at 1080p), (b) ffmpeg argv
shape: -vf scale=... emitted for pi5 HEVC, absent for pi4-64 HEVC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer,system): Pi 5 4K HEVC HW + display-resampled VO sync

Two linked changes that move every supported board to clean HW
decode at the source's actual framerate.

Pi 5 4K HEVC via cma-512
------------------------

Pi OS for Pi 5 reserves 64 MB of CMA by default. The Hantro G2
HEVC decoder needs a buffer pool large enough to hold several 4K
dst frames (each ~12 MB) plus reference frames, so the stock
allocation can fit 1080p HEVC but not 4K — at 4K mpv hits
``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and
silently SW-falls-back.

Adding ``cma=512M`` to /boot/firmware/cmdline.txt does NOT work:
the kernel takes the cmdline value over the device-tree
``linux,cma`` node, which orphans ``rpi-hevc-dec`` entirely
(returns ``Failed to probe hardware -517`` and ``/dev/video*``
disappears, killing HEVC HW at every resolution).

The Pi-OS-blessed merge is ``dtoverlay=vc4-kms-v3d,cma-512`` in
/boot/firmware/config.txt — the v3d overlay carries its own
``cma-N`` parameter that resizes the DT linux,cma node in place
without orphaning the codec driver. A standalone
``dtoverlay=cma,cma-512`` silently no-ops on Pi 5 because the
v3d overlay initialises the CMA region first; reusing the v3d
overlay's parameter is the documented way to merge them.

ansible/roles/system/templates/config.txt.j2 now emits the
``,cma-512`` parameter on Pi 5 only — Pi 4 already gets 512 MB
CMA by default so the override is a no-op there. The earlier
attempt at a kernel-cmdline cma= override (in cmdline.txt.j2) is
removed; the file's comment now points readers at the correct
config.txt path.

Live-verified on Pi 5: CmaTotal=512MB after the overlay change,
/dev/video* present, rpi-hevc-dec probes cleanly. Asset processor
pi5 profile no longer carries a HEVC pixel cap — Pi 5 can decode
HEVC at its silicon's real capability.
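
One way to repeat the CmaTotal check is to parse ``/proc/meminfo``.
This is a sketch only; ``cma_total_mib`` is a hypothetical helper and
not part of the Anthias tree:

```python
from __future__ import annotations


def cma_total_mib(meminfo_text: str) -> int | None:
    """Return the CmaTotal value from /proc/meminfo text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith('CmaTotal:'):
            # The kernel reports the value in kB: "CmaTotal:  524288 kB".
            return int(line.split()[1]) // 1024
    # Kernels built without CMA do not print the field at all.
    return None
```

On a device this would be called as
``cma_total_mib(open('/proc/meminfo').read())``.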

mpv --video-sync=display-resample
---------------------------------

mpv 0.40 defaults to ``--video-sync=audio`` which syncs the video
clock to the audio clock and drops VO frames when the two drift.
On every board tested (Pi 4 --vo=drm, Pi 5 + x86 --vo=gpu
--gpu-context=wayland) this produced 60–90% VO drops at 60 fps
content even when the decoder reported healthy HW decode
(``Using hardware decoding (...)`` banner present, no decoder
errors). The drops were at the VO, not the decoder.

``--video-sync=display-resample`` flips the relationship: sync
video to the display refresh and resample audio to match. Audio
resampling is a <1% CPU 2-channel job and most signage clips
have no audible content anyway, so it's effectively free; the
benefit is clean playback at the source's frame rate.

Test bed touched
----------------

* test_play_invokes_popen_with_expected_args_on_pi4_64: argv
  now includes ``--video-sync=display-resample``.
* test_video_can_passthrough_respects_board_codec_set: pi5 +
  hevc + 4K is now ``True`` (passthrough) because the CMA fix
  lets the silicon do its rated job. Comment updated to point
  at config.txt.j2.
* Removed the transient downscale-on-Pi 5 codepath
  (``transcode_video_max_pixels`` field, the
  ``-vf scale='if(gt(ih,...))':...`` filter, and the two tests
  asserting it) — that was a workaround for the CMA issue and
  is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): introduce PlaybackEnvelope dataclass + matrix + cache

Foundation for the per-board playback envelope rollout (see
/home/ubuntu/.claude/plans/serene-munching-gem.md). No behaviour
change yet — wires up the canonical source of truth that
processing.py, celery_tasks.py's future re-render walker, and the
viewer's hwdec dispatch will all read from in the next commit.

src/anthias_server/playback_envelope.py (new)
---------------------------------------------

Frozen dataclass ``PlaybackEnvelope`` carrying codec / max_width /
max_height / max_fps plus a fixed ``container_ext = 'mp4'``.
``ENVELOPE_BY_DEVICE_TYPE`` maps every supported board:

* pi2 / pi3 / arm64 → H.264 1920x1080 30 (no HEVC silicon /
  no upstream mpv HW path)
* pi4-64 / pi5 / x86 → HEVC 3840x2160 60 (dedicated HEVC block
  or VAAPI; fleet uniformity so the same upload produces
  bit-identical variants on every board)

``compute_envelope()`` resolves the current process's envelope
from DEVICE_TYPE; unset / unknown / mixed-case / whitespace all
fall back to the conservative default (H.264 1080p30).

``load_cached()`` / ``save_cached()`` round-trip the envelope to
``~/.anthias/playback-envelope.json``. Cache corruption (missing
file, bad JSON, unsupported codec) returns ``None`` so the caller
recomputes and overwrites — a hand-edit that breaks the file
self-heals on next start. ``save_cached`` writes atomically via
temp-file + rename.
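
A condensed sketch of the shape described above. The codec strings,
the explicit ``path`` / ``device_type`` parameters and the private
constant names are assumptions for illustration; the real module
resolves DEVICE_TYPE and the cache location itself:

```python
from __future__ import annotations

import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class PlaybackEnvelope:
    codec: str
    max_width: int
    max_height: int
    max_fps: int
    container_ext: str = 'mp4'


_H264_1080P30 = PlaybackEnvelope('h264', 1920, 1080, 30)
_HEVC_4K60 = PlaybackEnvelope('hevc', 3840, 2160, 60)
_DEFAULT = _H264_1080P30
_SUPPORTED_CODECS = ('h264', 'hevc')

ENVELOPE_BY_DEVICE_TYPE = {
    'pi2': _H264_1080P30,
    'pi3': _H264_1080P30,
    'arm64': _H264_1080P30,
    'pi4-64': _HEVC_4K60,
    'pi5': _HEVC_4K60,
    'x86': _HEVC_4K60,
}


def compute_envelope(device_type: str | None) -> PlaybackEnvelope:
    # Exact-match lookup: unset, unknown, mixed-case and padded values
    # all land on the conservative default.
    return ENVELOPE_BY_DEVICE_TYPE.get(device_type or '', _DEFAULT)


def save_cached(path: Path, envelope: PlaybackEnvelope) -> None:
    # Write to a temp file in the same directory, then rename over the
    # target so readers never observe a half-written cache.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fh:
            json.dump(asdict(envelope), fh)
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a stray temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_cached(path: Path) -> PlaybackEnvelope | None:
    # Missing file, bad JSON and wrong-shaped payloads all return None
    # so the caller recomputes and overwrites.
    try:
        envelope = PlaybackEnvelope(**json.loads(path.read_text()))
    except (OSError, ValueError, TypeError):
        return None
    if envelope.codec not in _SUPPORTED_CODECS:
        return None
    return envelope
```

Returning ``None`` for every corruption case keeps the recovery path
identical to the first-boot path.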

src/anthias_server/processing.py
--------------------------------

``_ffprobe_summary`` now returns ``video_fps`` alongside the
existing keys. The next commit (Phase 2) uses this to decide
whether to emit ``-r envelope.max_fps`` — the cap is one-way, so
sub-cap source rates pass through unchanged. r_frame_rate is
parsed as a rational ``num/den``; unparseable / zero-denominator
collapses to ``None`` so the caller treats source fps as
"unknown" and skips the gate.

tests
-----

* tests/test_playback_envelope.py (new): matrix coverage; unset /
  unknown / cased / whitespace inputs; cache round-trip; missing
  / corrupt JSON / invalid-payload recovery; atomic write
  (no leaked .tmp); container_ext invariant.
* tests/test_processing.py: video_fps parsing cases (integer
  rates, NTSC rates 30000/1001 + 60000/1001, bogus / no-slash
  / zero-denominator inputs); the two ``assert summary == { ... }``
  ffprobe-recovery tests now include the new ``video_fps: None``
  key.
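
The ``r_frame_rate`` parsing these cases exercise can be sketched as
below. ``parse_r_frame_rate`` is a hypothetical name, and treating a
value with no slash as unparseable is an assumption, since the commit
lists that input but not its expected result:

```python
from __future__ import annotations


def parse_r_frame_rate(value: str) -> float | None:
    """Parse ffprobe's rational "num/den" frame rate into a float."""
    num, sep, den = value.partition('/')
    if not sep:
        # Assumption: a value without a slash counts as unparseable.
        return None
    try:
        numerator, denominator = int(num), int(den)
    except ValueError:
        return None
    if denominator == 0:
        # ffprobe can report "0/0" for streams with no usable rate.
        return None
    return numerator / denominator
```

A ``None`` result means "unknown", which the caller uses to skip the
fps gate rather than guess.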

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): envelope-driven asset processor with sibling-original

Refactor ``processing.py`` so every video upload produces a
variant matching the board's playback envelope while preserving
the source as a sibling ``.original.<ext>`` file. Rotation is now
gapless by construction — every variant on disk shares one codec /
max resolution / max fps per board, so the viewer's output mode
never has to switch mid-clip.

src/anthias_server/processing.py
--------------------------------

* Replace ``_BOARD_PROFILES`` + ``_resolve_board_profile`` +
  ``_PI4_H264_MAX_PIXELS`` + ``_BoardProfile`` typedef with
  ``compute_envelope()`` from the new ``playback_envelope`` module
  (landed in 0b6bea0c). One canonical source of truth for "what
  every variant on disk looks like".

* ``_ffprobe_summary`` now returns per-axis dimensions
  (``video_width``, ``video_height``) alongside the existing
  ``video_pixels`` total. The envelope check is per-axis so an
  ultrawide source (e.g. 5760×1080) gets caught by the width cap
  even though its total pixel count is below 4K's.

* ``_video_can_passthrough(summary, envelope)`` is the new
  contract: passthrough iff (a) container is mp4, (b) codec
  matches envelope.codec exactly, (c) both axes are within the
  envelope cap, (d) source fps is at-or-under envelope.max_fps,
  (e) audio is demuxer-compatible. Any None in source dims / fps
  bails to transcode (we don't gamble on unsized clips).

* ``_transcode_to_target(input, output, envelope=None,
  source_summary=None)`` emits the smallest set of flags that
  lands the output inside the envelope. ``-vf scale=...`` only
  when source > envelope on either axis; ``-r envelope.max_fps``
  only when source fps > cap. The fps cap is one-way — we never
  up-convert a sub-cap source. New helper
  ``_video_args_for_codec`` picks libx264 / libx265 from the
  envelope's codec.

* ``_run_video_normalisation`` reorganised around the sibling-
  original pattern:
  - Fresh upload / legacy asset: rename ``Asset.uri`` to
    ``<base>.original.<ext>`` (the source-preservation step).
  - Re-render: read from the existing ``.original.*`` sibling
    instead.
  - Re-probe from the (possibly new) source location.
  - Passthrough branch: copy source → variant slot bitwise
    (cross-device fleet sha256 stays equal).
  - Transcode branch: staging-file render with the existing
    atomic-replace contract.
  - Stamp ``metadata['original_uri']`` (path to sibling),
    ``metadata['envelope']`` (envelope dict the variant matches).
    ``metadata['transcode_target']`` kept as the
    ``envelope.codec`` duplicate for one release of back-compat
    with the serializer surface.
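
The "smallest set of flags" rule for ``_transcode_to_target`` can be
illustrated as follows. The helper name and the exact scale
expression are assumptions (the commit does not quote the filter);
``force_original_aspect_ratio`` and ``force_divisible_by`` are
options of ffmpeg's scale filter:

```python
from __future__ import annotations


def envelope_clamp_args(
    src_w: int,
    src_h: int,
    src_fps: float,
    max_w: int,
    max_h: int,
    max_fps: int,
) -> list[str]:
    """Return only the ffmpeg flags needed to fit the envelope."""
    args: list[str] = []
    # Per-axis check: an ultrawide source trips the width cap even
    # when its total pixel count is under the envelope's.
    if src_w > max_w or src_h > max_h:
        args += [
            '-vf',
            f'scale={max_w}:{max_h}'
            ':force_original_aspect_ratio=decrease'
            ':force_divisible_by=2',
        ]
    # One-way cap: sub-cap sources keep their native rate.
    if src_fps > max_fps:
        args += ['-r', str(max_fps)]
    return args
```

A source already inside the envelope yields an empty list, so the
encode runs without any clamp.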

Tests
-----

* ``test_video_can_passthrough_decision_table`` recast against
  the H.264 1920×1080 30 default envelope. Each row tests one
  gate (codec / per-axis dim / fps / audio / unknowns / probe
  gaps) without overlap.
* ``test_video_can_passthrough_respects_envelope`` end-to-end:
  pin ``DEVICE_TYPE``, build a summary at the given
  (codec, w, h, fps), assert the verdict. Replaces the legacy
  ``..._respects_board_codec_set``.
* ``test_transcode_to_target_emits_scale_when_source_oversize``,
  ``..._emits_fps_clamp_when_source_fast``,
  ``..._omits_clamps_when_source_at_envelope``: pin the smallest
  ffmpeg flag set per source / envelope combination.
* ``_envelope_summary`` helper at the top of the file
  short-circuits the per-test summary construction.
* Mock signatures for ``_transcode_to_target`` updated to accept
  the new ``envelope`` / ``source_summary`` kwargs.
* ``test_resolve_board_profile_picks_target_codec_per_board``
  deleted — equivalent coverage is in tests/test_playback_envelope.py
  against ``compute_envelope`` directly.

Stale doc / comment references to ``_BOARD_PROFILES`` /
``_resolve_board_profile`` updated to point at
``playback_envelope.ENVELOPE_BY_DEVICE_TYPE`` /
``compute_envelope``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): re-render walker + startup envelope reconciler

* New celery task `regenerate_for_envelope_change`: walks
  `Asset.objects.filter(mimetype='video')` and queues
  `normalize_video_asset` for any row whose
  `metadata['envelope']` no longer matches the current envelope.
  Malformed payloads, missing keys, and per-row exceptions are
  logged but don't stop the walker.
* New `AnthiasAppConfig.ready` hook -> `app/startup.py:
  run_envelope_check`: compares cached vs computed envelope,
  persists fresh, dispatches the walker on mismatch. Short-circuits
  under `ENVIRONMENT=test` / `PYTEST_CURRENT_TEST` so pytest runs
  don't enqueue stray walkers. Celery dispatch failure is logged
  but non-fatal -- the cache is already saved, so the next start
  sees the new envelope on disk and recovers.
* Tests cover: skip-in-envelope, queue-stale, legacy migration
  (no envelope key), image-asset skip, force-requeue, malformed
  payload recovery, continue-after-per-row-failure, every
  hook code path (test short-circuit, no-cache, match, mismatch,
  dispatch failure, corrupt cache).
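
The per-row decision the walker makes can be sketched like this.
``needs_rerender`` is a hypothetical helper, and re-queueing rows
with malformed metadata (rather than skipping them) is an assumption
about the recovery behaviour:

```python
from __future__ import annotations

from typing import Any


def needs_rerender(
    metadata: Any, current_envelope: dict[str, Any], force: bool = False
) -> bool:
    """Decide whether a video row should be queued for normalisation."""
    if force:
        return True
    if not isinstance(metadata, dict):
        # Assumption: malformed metadata is treated as stale.
        return True
    # Legacy rows have no 'envelope' key, so .get() returns None and
    # the comparison flags them for migration.
    return metadata.get('envelope') != current_envelope
```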

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server): preserve `.original.<ext>` siblings during orphan sweep

The Celery ``cleanup`` task built its "referenced" set only from
``Asset.uri``. With sibling-original storage, the source bytes live
at ``metadata['original_uri']`` (e.g. ``<id>.original.mov``) while
``Asset.uri`` points at the playback variant (``<id>.mp4``). Without
this fix every video upload's ``.original.<ext>`` falls outside the
1h mtime guard once the variant lands and gets silently deleted on
the next hourly sweep — breaking the re-render walker as soon as
the envelope changes.

* ``cleanup``: union ``Asset.uri`` ∪ ``metadata['original_uri']``
  into the referenced set, tolerant of legacy rows with non-dict
  metadata.
* Tests cover the new claim path + the malformed-metadata
  fallback so a stray ``metadata=None`` row can't crash the sweep.
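
A sketch of the union, written against plain ``(uri, metadata)``
pairs instead of the Django queryset so it stands alone.
``referenced_paths`` is an illustrative name:

```python
from __future__ import annotations

from typing import Any, Iterable


def referenced_paths(rows: Iterable[tuple[str, Any]]) -> set[str]:
    """Collect every path the sweep must not delete."""
    referenced: set[str] = set()
    for uri, metadata in rows:
        if uri:
            referenced.add(uri)
        # Tolerate legacy rows whose metadata is not a dict.
        if isinstance(metadata, dict):
            original = metadata.get('original_uri')
            if isinstance(original, str) and original:
                referenced.add(original)
    return referenced
```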

The upload-path serializer itself stays untouched: the existing
``rename(tmp, <id><ext>)`` lands the upload at a single path, and
``processing._run_video_normalisation`` handles the
rename-to-``.original.<ext>`` atomically on first run. No double-
write, no extra disk traffic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(server): cover sibling-original storage across normalisation paths

Adds five tests pinning the ``.original.<ext>`` + variant contract
that the envelope walker depends on:

* fresh upload → ``<id>.original.<src_ext>`` created next to
  ``<id>.mp4``; ``metadata['original_uri']`` + ``metadata['envelope']``
  populated.
* re-render → ``.original.<ext>`` is byte-identical across passes
  (sha256 compared before/after); the walker reads from it and
  never rewrites it.
* passthrough → both files exist even when the source already
  matches the envelope (``shutil.copyfile`` semantics, not rename).
* legacy migration → pre-rollout assets with no ``original_uri``
  key get renamed to ``.original.<ext>`` on first walker pass.
* dangling ``original_uri`` → falls back to treating ``asset.uri``
  as the source-to-preserve; no silent error, no lost variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(board-enablement): replace codec policy table with playback envelope

* board-enablement.md now documents the envelope matrix as the
  single source of truth shared by the asset processor, the
  re-render walker, and the viewer's hwdec dispatch. The legacy
  ``_BOARD_PROFILES`` / ``passthrough_video_codecs`` vocabulary has
  been removed -- it never matched what ``processing.py`` does
  post-envelope.
* Calls out the ``<id>.original.<src_ext>`` + ``<id>.mp4`` sibling
  layout, the metadata keys the walker reads, and the cross-board
  fleet sha256 expectation.
* Pi 5 CMA quote rewritten: the real fix is
  ``dtoverlay=vc4-kms-v3d,cma-512`` in config.txt, not a downscale
  workaround. Kernel cmdline ``cma=`` is documented as the broken
  path it actually is.
* Failure-mode list updated for envelope-driven dispatch (off-
  envelope variant, display refresh ceiling, walker storm on
  unwritable cache, sha256 fleet divergence).
* ``media_player.py`` comment block: updates the Pi 5 H.264 →
  auto-copy and HEVC → drm-copy comments to reference the playback
  envelope by name and point at the correct CMA fix (config.txt
  dtoverlay, not cmdline.txt).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): mypy on `_make_video_asset` + boolean is_enabled

* `dict` annotations get explicit `dict[str, Any]` parameters
  (Anthias's mypy config sets `disallow_any_generics`).
* `is_enabled=1` → `is_enabled=True` so the Asset field's bool
  type matches mypy's view of django-stubs models.
* Adds the missing ``typing.Any`` import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server,tests): envelope-aware container gate + startup hook safety

Run 1 of CI surfaced several issues in the envelope refactor:

* **MP4 family container detection.** ffprobe reports an MP4 file's
  ``format_name`` as ``mov,mp4,m4a,3gp,3g2,mj2`` (``mov`` first
  because the QuickTime/MP4 demuxer is one codepath). The envelope
  gate compared the source container to ``envelope.container_ext``
  by exact equality, so every MP4 upload was rejected at the
  container gate even though the bytes are exactly what we'd
  write. Adds ``_MP4_FAMILY_CONTAINERS`` and special-cases ``mp4``
  envelope to accept any synonym.
* **Celery workers were running ``run_envelope_check``.**
  ``celery_tasks.py`` top-level-calls ``django.setup()``, which
  fires ``AppConfig.ready`` in every process that imports it,
  including the celery worker -- the previous comment in ``apps.py``
  was wrong. Two writers race on the cache file and could
  double-queue the walker for a single envelope change. New
  ``_is_celery_worker()`` short-circuit detects the
  ``celery -A ... worker`` invocation via ``sys.argv[0]``.
* **Settings singleton captures HOME at init.**
  ``AnthiasSettings.home`` is set once at module import time, so
  ``monkeypatch.setenv('HOME', tmpdir)`` in tests doesn't reach the
  envelope cache helpers. Updates ``cache_dir`` and ``fake_home``
  fixtures to also patch ``settings.home`` via ``monkeypatch.setattr``.
* **Stale tests.**
  - Drop ``test_cleanup_tolerates_non_dict_metadata`` -- the schema
    enforces ``metadata`` as a non-null JSON dict, so the failure
    mode it claimed to test can't occur. ``cleanup()`` keeps the
    defensive ``isinstance(metadata, dict)`` check as a no-cost
    belt-and-braces guard.
  - ``test_video_passthrough_for_h264_or_hevc_in_known_containers``
    rewritten as ``test_video_passthrough_when_source_matches_board_envelope``
    -- the old matrix included libx264 on pi4-64 (no longer
    passthrough because pi4-64 is HEVC) and non-mp4 containers
    (always re-encoded now because the variant slot is fixed at
    ``.mp4``).
  - ``test_video_passthrough_records_target_codec`` switches the
    source codec to libx265 so it actually hits the passthrough
    branch on pi4-64.
  - ``test_video_passthrough_uses_summary_duration_no_second_probe``
    rebuilt via ``_envelope_summary`` so the synthesised summary
    carries the new ``video_width / video_height / video_fps``
    fields.
  - The two ``test_ffprobe_summary_handles_*`` early-return shape
    assertions add ``video_width`` / ``video_height`` to match the
    real return shape.
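
The container gate from the first bullet can be sketched as below.
The contents of the set are an assumption taken from the ffprobe
``format_name`` string quoted above, and ``container_matches`` is a
hypothetical helper name:

```python
from __future__ import annotations

# Assumed contents: every name ffprobe lists for the MOV/MP4 demuxer.
_MP4_FAMILY_CONTAINERS = frozenset(
    {'mov', 'mp4', 'm4a', '3gp', '3g2', 'mj2'}
)


def container_matches(format_name: str | None, container_ext: str) -> bool:
    """Check a probed format_name against the envelope container."""
    if not format_name:
        return False
    names = set(format_name.split(','))
    if container_ext == 'mp4':
        # ffprobe reports "mov,mp4,m4a,3gp,3g2,mj2" for MP4 files, so
        # exact equality with 'mp4' would reject every real upload.
        return bool(names & _MP4_FAMILY_CONTAINERS)
    return container_ext in names
```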

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server,tests): drop PYTEST_CURRENT_TEST gate; align stale summaries

Run 2 of CI surfaced three more issues:

* **``PYTEST_CURRENT_TEST`` is not fixture-controllable.** pytest
  re-sets the env var at the start of every test's ``call`` phase,
  so ``monkeypatch.delenv`` in a ``setup`` fixture is overridden
  before the body runs. This made it impossible for any test to
  exercise the real startup hook path. The ``ENVIRONMENT=test``
  gate (set in ``conftest.py`` + the test compose file) is the
  durable, fixture-controllable signal — keep that, drop the
  pytest one. Test for the new ``_is_celery_worker`` short-circuit
  replaces the deleted ``test_short_circuits_when_pytest_current_test``.
* **Decision table parametrise had a wrong expectation.** Summary
  row "HEVC at envelope (codec, dims, fps all match)" was paired
  with ``expected=True``, but the test envelope is H.264 — codec
  mismatch must transcode, ``False``.
* **``test_video_passthrough_skips_duration_when_probe_unavailable``
  summary missed the new dim/fps fields.** Same root cause as
  before: ``_video_can_passthrough`` rejected the synthesised
  summary at the dims gate, the test fell through to a real
  ffmpeg call on a 64-byte stub, and ffmpeg failed with "Invalid data found".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(envelope): add generic-arm64 key for Rock Pi / Armbian SBCs

The Anthias install path for Rock Pi 4 / Armbian boards writes
``DEVICE_TYPE=generic-arm64`` (see ``feat(install): generic-arm64
best-effort support``). The matrix only listed ``arm64``, so a
real install fell through to ``_DEFAULT`` — same envelope by
coincidence, but the walker would have logged "no matrix entry"
warnings on every server start and the docs/board-enablement
matrix would be subtly wrong about which key applies.

Lists the key explicitly with the same conservative H.264 1080p30
envelope and extends the parametrise coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server): make celery_tasks.py top-level django.setup() reentrant-safe

``django.setup()`` calls ``apps.populate()``, which raises
``RuntimeError: populate() isn't reentrant`` if invoked while
already populating. The new ``AnthiasAppConfig.ready`` hook imports
``celery_tasks`` to dispatch the walker, which until this change
top-level-called ``django.setup()`` again -- so on every real
server start the import died, the dispatch failed, and the walker
never ran. Live-confirmed on the Pi 4 test bed.

Check ``django.apps.apps.apps_ready`` before calling ``setup()``:
the flag flips to True after the import phase but before per-app
``ready`` hooks run, so the standalone celery worker (where Django
isn't initialised yet) still calls setup() as before, while the
server process (mid-populate) correctly skips the reentrant call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server): commit `original_uri` to DB before transcode (crash safety)

Live-confirmed on the Pi 4 test bed during the envelope rollout:
walker fired on a near-full SD card, ffmpeg ran out of space mid-
render, the on_failure hook cleared ``is_processing`` -- and the
hourly ``cleanup()`` sweep then silently deleted every
``.original.<ext>`` source it had just renamed, because
``Asset.uri`` still pointed at the (now-missing) variant path and
the orphan walker only knew about ``Asset.uri`` + a *committed*
``metadata['original_uri']``.

The metadata accumulator in ``_run_video_normalisation`` only wrote
to the DB at the end of the function, so any failure between
"rename source → .original.<ext>" and "render variant → atomic
replace" left the row's metadata stale.

Fix: persist ``metadata`` to the DB right after the rename, before
attempting any render. The contract becomes: if the file is on
disk under ``.original.<ext>``, the DB row knows it. ``cleanup()``
already reads ``metadata['original_uri']`` into the referenced set
(from ``fix(server): preserve `.original.<ext>` siblings during
orphan sweep``), so this commit closes the only window where that
guard could be bypassed.

Adds ``test_original_uri_persisted_before_render_for_crash_safety``
which mocks ``_transcode_to_target`` to raise and verifies the row
has ``metadata['original_uri']`` committed by the time the
exception propagates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(board-enablement): script-driven 1-minute sample pack

Previously the test pack was full-length BBB clips (~10 min) plus an
inline ffmpeg recipe in the docs that produced 4K HEVC re-encodes
taking ~30 min on a workstation. The on-device walker then had to
chew through the full-length variants, which on a Pi 4 / Rock Pi
turned a single rotation cycle into hours of wallclock for what was
really a hwdec-banner sanity check.

* New ``bin/generate_board_enablement_testbed.sh``: downloads the
  four BBB H.264 sources, trims each to 60 s with ``-c copy``
  (instant), then libx265-encodes each cut. Idempotent (skips
  files that already pass an ffprobe sanity check) and atomic
  (tmp-then-rename) so a power cycle mid-encode leaves a clean
  state.
* Pack drops from ~3.3 GB / 10 min per clip to ~350 MB / 60 s per
  clip. 60 s is enough to capture mpv's ``hwdec-current`` banner
  and read a stable ``Dropped:`` count, while keeping a full
  walker pass under a few minutes on every supported board.
* ``CUT_SECONDS`` / ``HEVC_CRF`` env knobs override defaults for
  iteration; the table in the doc lists what each clip exercises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(envelope,viewer): runtime Rock Pi 4 detection unlocks v4l2m2m HW decode

``bin/install.sh`` writes ``DEVICE_TYPE=arm64`` for every aarch64
SBC it doesn't recognise as a Pi — Rock Pi 4, Orange Pi, Allwinner
H6 boards, Amlogic S905 boards all share that one catch-all
DEVICE_TYPE. The matrix can't promote ``arm64`` to HEVC + HW
because most of those boards have no upstream-mpv HW decode path
and would log "Could not find a valid device" on every play.

But the Rock Pi 4 (RK3399 / Radxa) DOES have a working v4l2m2m
driver exposed by the kernel:

  $ docker exec anthias-anthias-viewer-1 mpv --hwdec=help | grep v4l2m2m
    v4l2m2m-copy (h264_v4l2m2m-v4l2m2m-copy)
    v4l2m2m-copy (hevc_v4l2m2m-v4l2m2m-copy)
    v4l2m2m-copy (vp9_v4l2m2m-v4l2m2m-copy)
    ...

and ``/dev/video-dec2`` / ``/dev/video-dec4`` are present (the
v4l2_request decoder symlinks). Leaving Rock Pi on SW decode for
1080p HEVC measurably wastes the silicon.

Resolved at runtime via ``/proc/device-tree/model``:

* New matrix key ``rockpi4`` → HEVC 1920×1080 30. 1080p ceiling
  keeps disk use of the variant + ``.original.<ext>`` sibling
  comfortable on the typical SD card; HEVC codec exercises the
  Hantro path on the way through the viewer.
* ``compute_envelope`` and ``_pi_hwdec_for_uri`` both probe the
  device tree when DEVICE_TYPE is ``arm64`` (or legacy
  ``generic-arm64``). A Rock Pi 4B reports
  ``Radxa ROCK Pi 4B`` and gets upgraded; an Orange Pi or an
  Allwinner H6 board stays on the conservative SW envelope.
* Failure modes (no device tree, decode error, unknown SBC) all
  collapse to ``None`` so dev containers and the existing arm64
  catch-all keep working unchanged.

Four new tests pin:
- Rock Pi model → ``rockpi4`` envelope;
- legacy ``generic-arm64`` label also gets the upgrade;
- unknown SBC keeps the conservative envelope;
- missing ``/proc/device-tree/model`` doesn't raise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(envelope,viewer): publish board subtype via host_agent + Redis

Previous commit (``dde1b20e``) added a runtime ``/proc/device-tree``
read inside the server + viewer containers. Containers don't see
that path by default, and mounting it into every container is
heavier than it's worth for one edge case (worse, balena's
restricted /proc would still trip).

``anthias_host_agent`` already runs on the host and publishes
host-side state to Redis (IP addresses, etc.). It's the right
layer for board identification:

* New ``detect_board_subtype()`` reads
  ``/proc/device-tree/model`` directly (host_agent IS on the
  host) and maps known SBC strings to matrix keys
  (Rock Pi 4A/4B/4C → ``rockpi4``).
* New ``set_board_subtype()`` publishes the resolved key (or the
  empty string for unknown boards) to ``host:board_subtype``
  before ``subscriber_loop`` flips ``host_agent_ready`` — so
  consumers can rely on the key being there once the readiness
  flag is set.
* Server's ``playback_envelope.compute_envelope`` and viewer's
  ``_pi_hwdec_for_uri`` read the same Redis key when DEVICE_TYPE
  is ``arm64`` / legacy ``generic-arm64``. Failure modes (Redis
  down, key missing, decode error) all collapse to ``None`` so
  the caller falls back to the conservative arm64 envelope.
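
The device-tree mapping step can be sketched as a pure function over
the raw bytes of ``/proc/device-tree/model``. The regex and the
function name are assumptions; the commit only states that Rock Pi
4A/4B/4C strings resolve to ``rockpi4``:

```python
from __future__ import annotations

import re

# Assumed pattern: matches "ROCK Pi 4", "Rock Pi 4B", "ROCK Pi 4C", ...
_ROCKPI4 = re.compile(r'rock\s*pi\s*4[abc]?\b', re.IGNORECASE)


def subtype_from_model(raw: bytes) -> str | None:
    """Map a device-tree model string to a matrix key, if known."""
    try:
        # Device-tree strings are NUL-terminated.
        model = raw.decode('utf-8').rstrip('\x00').strip()
    except UnicodeDecodeError:
        return None
    if _ROCKPI4.search(model):
        return 'rockpi4'
    # Unknown boards stay on the conservative arm64 envelope.
    return None
```

The publisher side would then write the result, or an empty string
for ``None``, to ``host:board_subtype``.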

No compose template changes. The viewer + server containers
already have Redis reachable (they use it for the Channels
layer + walker dispatch already), so the data path is free.

Unit tests pin:
* device-tree → subtype mapping for canonical + variant + edge
  Rock Pi strings, plus unknown boards;
* Redis publish writes the resolved key OR empty string;
* server's compute_envelope reads back through Redis correctly
  for known / unknown / empty / unreachable cases;
* subscriber_loop calls set_board_subtype before flipping
  ``host_agent_ready`` — race-free ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(celery): cap walker to --concurrency=1 so transcodes can't choke playback

Celery's default worker concurrency equals the number of CPU
cores. On the boards Anthias actually ships to (Pi 4 / Pi 5 /
Rock Pi 4 / arm64 SBCs), that means up to 4 parallel ``libx265``
encodes sharing the same SoC as the viewer's mpv process.
``nice -n 19`` +
``ionice -c 3`` are already in place, but nice(1) only helps
when there's CONTENTION -- four ffmpegs at nice 19 still
saturate every core, and each 1080p libx265 encode needs ~500 MB
RAM. A 4 GB SBC pushes into swap well before the walker
finishes, which stalls *everything* on the host -- live-
confirmed on the Rock Pi 4 during this PR: sshd starved through
banner exchange whenever the walker hit a fresh burst.

Asset processing is upload-time, not throughput-bound. The
operator-facing latency that matters is "upload click → asset
visible in rotation", which is bound by ONE encode regardless of
queue parallelism. Serial encodes finish a few minutes later in
wallclock but the viewer never drops a frame.

Applied to every prod / dev compose template. ``docker-compose.test.yml``
is left at default because the test suite never runs live
normalize tasks (the celery service in tests just exercises the
task dispatch plumbing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): force MPV on legacy ``generic-arm64`` DEVICE_TYPE

Rock Pi 4 running an older arm64 image reports
``DEVICE_TYPE=generic-arm64`` (pre-``refactor: rename device_type
generic-arm64 → arm64`` rebuilds). The MediaPlayerProxy
override only force-routed MPV for ``arm64`` / ``pi4-64``, so the
legacy label fell through to VLC -- which then crashed with
``NameError: no function 'libvlc_new'`` because the libvlc lib
isn't installed on the arm64 image. Live-confirmed in the viewer
crash loop on the Rock Pi 4 during this PR.

Adds ``'generic-arm64'`` to the force_mpv set + a test pinning
the dispatch. Covers the in-the-wild rolling-upgrade window
where a Rock Pi 4 deployment is sitting on an old image.
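
The dispatch fix can be sketched roughly as below. ``FORCE_MPV_DEVICE_TYPES`` and ``pick_player`` are hypothetical names, and the real ``MediaPlayerProxy`` override may consider more than the DEVICE_TYPE label; only the three labels and the fall-through-to-VLC behaviour are taken from this commit message.

```python
# Hypothetical names; the labels come from the commit message.
FORCE_MPV_DEVICE_TYPES = frozenset({'arm64', 'pi4-64', 'generic-arm64'})


def pick_player(device_type: str) -> str:
    """Return 'mpv' for force-routed boards, else fall through to 'vlc'."""
    return 'mpv' if device_type in FORCE_MPV_DEVICE_TYPES else 'vlc'
```

Before the fix the set would have lacked ``'generic-arm64'``, so a legacy image took the ``'vlc'`` branch and hit the missing libvlc.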

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(viewer): route ``generic-arm64`` through cage + ALSA-default like ``arm64``

Two more places in ``media_player.py`` only checked the post-rename
``arm64`` DEVICE_TYPE and missed the legacy ``generic-arm64`` label
the Rock Pi 4 test bed still reports:

* **VO dispatch** (line ~419) — without this, a generic-arm64 host
  falls through to the ``--vo=drm`` else branch, which mpv aborts
  with "No primary DRM device could be picked" because cage already
  holds DRM master in the cage + Wayland viewer stack
  (live-confirmed on the Rock Pi 4 in this PR).
* **ALSA card selection** (``get_alsa_audio_device``) — the Pi-name
  dispatch below the env-var check picks ``vc4hdmi`` / "Headphones"
  cards that don't exist on Rockchip / Allwinner / Amlogic. Without
  the legacy label here, mpv tries to open the Pi-specific HDMI
  card and dies with ``Unknown PCM sysdefault:CARD=vc4hdmi``.

Both branches now use the shared ``_ARM64_DEVICE_TYPES`` frozenset
that already governs the hwdec subtype probe, so the three paths
(envelope, hwdec dispatch, VO + ALSA) agree on what DEVICE_TYPE
labels are aarch64-catch-all.
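
A rough sketch of what sharing one frozenset across the two branches looks like. The function names are hypothetical, and every other board branch is collapsed into a single ``else`` here, so this reflects only the arm64 labels discussed in this commit and not the full dispatch in ``media_player.py``.

```python
# Labels the commit message treats as the aarch64 catch-all.
_ARM64_DEVICE_TYPES = frozenset({'arm64', 'generic-arm64'})


def vo_args(device_type: str) -> list[str]:
    """mpv video-output flags for the arm64 catch-all labels."""
    if device_type in _ARM64_DEVICE_TYPES:
        # cage already holds DRM master, so mpv must render via Wayland.
        return ['--vo=gpu', '--gpu-context=wayland']
    return ['--vo=drm']  # stand-in for the pre-existing else branch


def skip_pi_alsa_dispatch(device_type: str) -> bool:
    """True when the Pi-specific card names (vc4hdmi etc.) must not be tried."""
    return device_type in _ARM64_DEVICE_TYPES
```

Keeping the membership test in one constant means a future label rename only has to touch a single definition.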

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(envelope): Rock Pi 4 stays on H.264 1080p30 -- stock ffmpeg has no v4l2_request

Live testing on the Rock Pi 4 surfaced that the arm64 viewer
image's stock ffmpeg (Debian 7.1.3-0+deb13u1) is built without
``--enable-v4l2-request``, and the underlying kernel exposes the
RK3399's decoders only via the stateless v4l2_request API
(``rkvdec`` for HEVC, the Hantro block as ``rockchip,rk3399-vpu-dec``
for H.264). ffmpeg's stateful ``hevc_v4l2m2m`` / ``h264_v4l2m2m``
decoders can't reach them -- mpv logs ``Could not find a valid
device`` even after ``/dev/video-dec*`` symlinks are present.
mpv ``--hwdec=help`` also doesn't list rkmpp or drm-copy, so
there's no other path through the stock build.

So:

* ``rockpi4`` envelope drops from HEVC 1920x1080 30 to H.264
  1920x1080 30 -- the same conservative tier as the generic
  ``arm64`` catch-all. The viewer SW-decodes 1080p30 in real
  time on the Cortex-A72; no frames dropped, just no HW gain
  over plain ``arm64``.
* Rock Pi entry drops from ``_PI_HWDEC_BY_CODEC`` -- mpv falls
  through to ``auto-copy`` which mpv's whitelist resolves to
  SW decode on this build.
* host_agent's subtype publish, the start_viewer.sh
  ``/dev/video-dec*`` symlink creation, and the dedicated
  ``rockpi4`` matrix key all stay in place -- they're
  forward-compatible scaffolding so a follow-up enabling
  v4l2_request (or linking rkmpp) in the viewer build only has
  to bump the matrix entry's codec to ``hevc`` and add the
  hwdec dispatch row. No further plumbing churn.
* Tests + docs reflect the routing-without-HW reality.

The legacy-label fixes from this PR (force_mpv +
``--vo=gpu --gpu-context=wayland`` + ALSA default for the
``generic-arm64`` DEVICE_TYPE) are unaffected -- those are real
bug fixes the Rock Pi 4 needs to play *anything* under cage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(viewer,envelope): extend +rpt1 ffmpeg to arm64; Rock Pi 4 = HEVC 4Kp60

The Raspberry Pi APT repo's ffmpeg build (``+rpt1``) ships with
``--enable-v4l2-request --enable-libudev --enable-vout-drm``,
which the stock Debian Trixie ffmpeg drops. Without those flags
the v4l2_request hardware decoder family is unreachable from
mpv — which is exactly what bit the Rock Pi 4 in this PR:
RK3399's ``rkvdec`` (HEVC) and Hantro VPU (H.264) are both
stateless v4l2_request decoders. Pi 4 / Pi 5 already pull from
the +rpt1 repo for the same reason; extending the conditional in
``Dockerfile.viewer.j2`` to also include ``arm64`` lights up
hardware decode on every arm64 SBC whose kernel exposes
v4l2_request decoders (Rock Pi, Orange Pi RK356x, Pine64,
Allwinner H6 with Cedrus, ...).

* ``Dockerfile.viewer.j2`` — board conditional ``('pi4-64',
  'pi5')`` → ``('pi4-64', 'pi5', 'arm64')``. The apt pin already
  restricts the +rpt1 repo to ``ffmpeg + libav* + mpv``, so other
  arm64 packages stay on stock Debian. Comment block updated to
  list which decoders each board reaches via this path.
* ``playback_envelope.py`` — ``rockpi4`` envelope flips from
  H.264 1080p30 to HEVC 3840×2160 60. RK3399's Hantro G2 is the
  same decoder family as Pi 5's and supports 4Kp60 per the
  Rockchip datasheet — matching Pi 5's envelope keeps the fleet
  uniform.
* ``media_player.py`` — ``_PI_HWDEC_BY_CODEC['rockpi4']`` maps
  both h264 and hevc to ``drm-copy`` (the v4l2_request hwdec
  path, same as Pi 5 for HEVC).
* Tests + docs updated accordingly.
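
For illustration, the hwdec lookup might be shaped like the sketch below. The nested-dict layout and the ``hwdec_for`` helper are assumptions; only the ``rockpi4`` row and the ``auto-copy`` fall-through are stated in these commit messages, and rows for other boards are deliberately omitted.

```python
# Assumed layout: board subtype -> source codec -> mpv --hwdec value.
_PI_HWDEC_BY_CODEC = {
    'rockpi4': {'h264': 'drm-copy', 'hevc': 'drm-copy'},
}


def hwdec_for(board: str, codec: str) -> str:
    """Return the --hwdec value, falling through to auto-copy."""
    return _PI_HWDEC_BY_CODEC.get(board, {}).get(codec, 'auto-copy')


def hwdec_flag(board: str, codec: str) -> str:
    """Format the value as an mpv command-line flag."""
    return f'--hwdec={hwdec_for(board, codec)}'
```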

The legacy-arm64 fixes (force_mpv + cage VO + ALSA default for
``generic-arm64``) and the host_agent subtype publish are
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(celery): cgroup CPU hard cap (`cpus: 1.0`) so encodes never starve the viewer

``nice -n 19 ionice -c 3`` + ``--concurrency=1`` lower priority and
limit parallelism, but they're soft hints — when libx265 is the
only heavy workload on the box the scheduler still hands it
everything available. Live-confirmed on the Rock Pi 4 in this PR:
sshd starved through banner exchange and mpv dropped mid-frame
during walker bursts, even with all three soft caps in place.

``cpus: 1.0`` is a cgroup CFS quota — one CPU's worth of compute
per period, kernel-enforced. On every supported SBC (Pi 4 / Pi 5 /
Rock Pi 4, all 4-core) it leaves 3+ cores for the viewer, the
host_agent, sshd, and everything else. x86 hosts have 8+ cores so
the cap is conservative there but harmless — asset processing is
upload-time, not throughput-bound.

Applied to every prod / dev compose template. test compose stays
uncapped because the test suite runs in CI environments with
deterministic resources where the cap would just slow CI down
without protecting anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(celery): scale CFS quota with host cores (half of $(nproc), min 1.0)

A flat ``cpus: 1.0`` is too aggressive: it forces a single-thread
ceiling even when the host has many idle cores. On an 8-core x86
deployment the asset processor would take 4x longer than it needs
to without protecting anything we don't already protect.

Compute the limit dynamically in ``bin/upgrade_containers.sh``:
``$(nproc) * 0.5``, clamped to a minimum of 1.0 so single-core
hosts still make progress. On the supported boards this lands at:

  * 4-core Pi 4 / Pi 5 / Rock Pi 4 → cpus: 2.0 (2 cores headroom
    for the viewer + system)
  * 8-core x86 → cpus: 4.0 (4 cores headroom)
  * 16-core x86 → cpus: 8.0 (still 50/50 with the system)
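
The arithmetic behind the table above, mirrored in Python as a sketch. The actual computation lives in shell inside ``bin/upgrade_containers.sh``; the function name ``celery_cpu_limit`` is hypothetical.

```python
import os
from typing import Optional


def celery_cpu_limit(cores: Optional[int] = None) -> float:
    """Half the host's cores, never below one full CPU."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1.0, cores * 0.5)


for n in (1, 4, 8, 16):
    print(f'{n} cores -> cpus: {celery_cpu_limit(n)}')
```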

Soft priorities (``nice -n 19 ionice -c 3``) and the
``--concurrency=1`` walker still apply on top; the cgroup quota
is the hard backstop that guarantees "encoding never impacts
playback or UI access". Live test on the Rock Pi 4 (in this PR)
proved the soft caps alone aren't enough — libx265 saturated
every core and starved sshd through banner exchange.

The balena compose templates use a literal ``cpus: 2.0`` (balena
only targets 4-core Pi 2/3/4/5 today); the non-balena prod
compose substitutes the env var. Dev compose also uses a literal
``2.0`` since dev hosts vary too widely to autodetect cheaply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(walker): hardware-decode the source in the transcode pipeline

The walker's encode pass stays libx265-software-bound on every
SBC (none of Pi 4 / Pi 5 / Rock Pi 4 have HEVC HW encode), but
the *decode* half of the pipeline can be offloaded to the same
silicon mpv uses for playback. That's typically 30-50% of the
ffmpeg wall-clock on H.264 sources and dominant on 4K — well
worth the small dispatch table.

* ``_decode_hwaccel_args(source_codec)`` returns the per-board
  ``-hwaccel`` flags to prepend to the ffmpeg invocation. Uses
  the same host_agent subtype probe (``host:board_subtype`` in
  Redis) that envelope resolution already uses, so the walker
  and viewer agree on what board they're targeting.
* Dispatch matrix:
  - Pi 4 (V3D V4L2 M2M + rpi-hevc-dec) → ``-hwaccel drm`` for
    both H.264 and HEVC (the +rpt1 ffmpeg's v4l2_request path).
  - Pi 5 (Hantro G2) → ``-hwaccel drm`` for HEVC only.
  - Rock Pi 4 (rkvdec + Hantro VPU) → ``-hwaccel drm`` for both,
    same v4l2_request path as Pi 5.
  - x86 (VAAPI) → ``-hwaccel vaapi -hwaccel_device
    /dev/dri/renderD128`` for both.
  - Pi 2 / Pi 3 / unknown arm64 → no HW path mpv can address;
    SW decode is the only choice.
* ``_transcode_to_target`` wraps the ffmpeg call: first attempt
  with hwaccel args, fall back to SW decode on
  ``sh.ErrorReturnCode`` (kernel driver misbehaving, device busy,
  or a bitstream the v4l2_request decoder rejects). Logs the
  underlying ffmpeg stderr at WARNING so an operator chasing a
  slow walker can see that the HW path failed.

Tests pin every cell of the dispatch matrix + assert ``-hwaccel``
lands BEFORE ``-i`` in the argv (placing it after silently
no-ops in ffmpeg) + the two-call SW-fallback path on simulated
HW init failure.
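
A self-contained sketch of the dispatch and fallback described above. It is not the walker's code: it swaps the ``sh`` library for ``subprocess`` so the example has no third-party dependency, the board labels used as keys are assumed spellings, and ``decode_hwaccel_args`` here takes the board explicitly instead of probing Redis.

```python
import logging
import subprocess
from typing import Callable

log = logging.getLogger(__name__)

_DRM = ('-hwaccel', 'drm')
_VAAPI = ('-hwaccel', 'vaapi', '-hwaccel_device', '/dev/dri/renderD128')

# (board, source codec) -> flags, following the matrix in the commit message.
_HWACCEL_BY_BOARD_CODEC = {
    ('pi4-64', 'h264'): _DRM,
    ('pi4-64', 'hevc'): _DRM,
    ('pi5', 'hevc'): _DRM,
    ('rockpi4', 'h264'): _DRM,
    ('rockpi4', 'hevc'): _DRM,
    ('x86', 'h264'): _VAAPI,
    ('x86', 'hevc'): _VAAPI,
}


def decode_hwaccel_args(board: str, source_codec: str) -> list[str]:
    """Empty list means software decode is the only choice."""
    return list(_HWACCEL_BY_BOARD_CODEC.get((board, source_codec), ()))


def build_argv(src: str, dst: str, hwaccel: list[str]) -> list[str]:
    # Input options such as -hwaccel must precede the -i they apply to.
    return ['ffmpeg', '-y', *hwaccel, '-i', src, '-c:v', 'libx265', dst]


def transcode(src: str, dst: str, board: str, source_codec: str,
              run: Callable = subprocess.run) -> str:
    """Try HW decode first, then retry once with SW decode."""
    hwaccel = decode_hwaccel_args(board, source_codec)
    if hwaccel:
        try:
            run(build_argv(src, dst, hwaccel), check=True, capture_output=True)
            return 'hw'
        except subprocess.CalledProcessError as exc:
            log.warning('HW decode failed, retrying in SW: %s', exc.stderr)
    run(build_argv(src, dst, []), check=True, capture_output=True)
    return 'sw'
```

Passing ``run`` as a parameter lets a test simulate a HW init failure without invoking ffmpeg.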

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server-image): extend +rpt1 ffmpeg pin to anthias-server too

The walker's HW-decode optimization (``processing._decode_hwaccel_args``
emits ``-hwaccel drm``) only works against the Raspberry Pi repo's
``+rpt1`` ffmpeg build, which has ``--enable-v4l2-request``. The
pin was previously only on the *viewer* image (Dockerfile.viewer.j2
in ``ba8d4709``), so the celery container — which runs the walker —
kept the stock Debian ffmpeg and the hwaccel call silently fell
back to SW on every board.

* New ``docker/_rpt1-ffmpeg-pin.j2`` extracts the pin block.
* Both ``Dockerfile.viewer.j2`` and ``Dockerfile.server.j2`` now
  include it via ``{% include '_rpt1-ffmpeg-pin.j2' %}``. Server
  also re-runs ``apt install --reinstall ffmpeg libav*`` so the
  pinned version replaces whatever the base layer installed.
* No effect on Pi 2 / Pi 3 / x86 boards — the include's
  ``{% if board in ('pi4-64', 'pi5', 'arm64') %}`` keeps it
  inert there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(celery,viewer): four hardening fixes so the player survives an upgrade

Live testing on Pi 4 / Pi 5 / Rock Pi 4 surfaced four scenarios
where a single ``docker compose pull && up -d`` (or any upgrade
that invalidates the playback envelope) wedges the device. These
aren't test-harness flakes; production operators on the same
hardware would hit them. All four belong in this PR alongside the
features that exposed them.

1. **Walker drip-feed** — ``regenerate_for_envelope_change``
   previously queued every stale ``normalize_video_asset`` in one
   beat tick. ``--concurrency=1`` serialises *execution* but the
   celery worker fetches the next task the instant the previous
   finishes, so a 100-asset catalog turns into hours of back-to-
   back libx265 with zero recovery windows between encodes.
   Switch to ``apply_async(args=..., countdown=N * 60)`` so
   each subsequent normalize starts at least 60 s after the
   previous was queued. Operator can flip ``is_processing=False``
   on a row mid-window to cancel its turn.
2. **``mem_limit`` on celery container** — cgroup CPU isolation
   alone doesn't stop libx265-4K from allocating ~1.5 GB resident
   memory, which on a 4 GB SBC pushes the system into swap and
   starves sshd + the viewer. Match the cpus cap with a memory
   cap (60% of host RAM, computed in ``bin/upgrade_containers.sh``).
3. **``stop_grace_period: 3s`` + ``stop_signal: SIGKILL`` on
   viewer** — cage doesn't reliably release DRM master on
   SIGTERM (its libinput shutdown path hangs on certain kernels)
   and the kernel's GPU driver leaves dangling references that
   prevent the next ``up`` from acquiring DRM master. Skipping the
   SIGTERM-then-wait dance on intentional restarts gets the
   device past cage's bug deterministically.
4. **libx265 / libx264 ``-preset superfast``** — was ``medium``.
   Asset processing is upload-time and only runs once per asset,
   so the 5-10× wallclock speedup directly shortens the wait between
   upload and the asset appearing in rotation.
   The ~10-20% bitrate increase is invisible on typical signage
   content. Viewer decode is HW regardless of preset.
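
As a worked example of the memory cap in item 2, the 60% figure can be computed as below. The real calculation is done in shell in ``bin/upgrade_containers.sh``; the function name is hypothetical, and integer arithmetic is used here only to keep the result exact.

```python
def celery_mem_limit_bytes(total_ram_bytes: int, percent: int = 60) -> int:
    """Return ``percent`` of host RAM in bytes, rounded down."""
    return total_ram_bytes * percent // 100


four_gib = 4 * 1024**3
limit = celery_mem_limit_bytes(four_gib)
# What remains outside the celery cgroup for the viewer, sshd, etc.
headroom = four_gib - limit
print(limit, headroom)
```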

Tests:
* Walker test mocks switched from ``.delay`` to ``.apply_async``;
  signatures updated for ``args=(...,)`` + ``countdown=`` kwarg.
* New ``test_regenerate_walker_spaces_dispatches_via_countdown``
  asserts the countdowns are ``[0, 60, 120, ...]`` across a
  5-asset catalog so the drip-feed contract is pinned.
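
The drip-feed contract pinned by that test can be sketched as follows. ``dispatch_drip_feed`` and the ``RecordingTask`` stand-in are hypothetical; ``apply_async(args=..., countdown=...)`` matches the call shape quoted in item 1.

```python
from typing import Iterable


class RecordingTask:
    """Stand-in for a Celery task; records what would be queued."""

    def __init__(self) -> None:
        self.calls: list[tuple[tuple, int]] = []

    def apply_async(self, args=None, countdown=None):
        self.calls.append((args, countdown))


def dispatch_drip_feed(task, asset_ids: Iterable[str],
                       spacing_s: int = 60) -> list[int]:
    """Queue one normalize per asset, spaced ``spacing_s`` seconds apart."""
    countdowns = []
    for n, asset_id in enumerate(asset_ids):
        countdown = n * spacing_s
        task.apply_async(args=(asset_id,), countdown=countdown)
        countdowns.append(countdown)
    return countdowns


task = RecordingTask()
countdowns = dispatch_drip_feed(task, ['a', 'b', 'c', 'd', 'e'])
```

Because the countdown is relative to queue time, the gaps are between scheduled start times, not between one encode finishing and the next starting.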

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): use sh.ErrorReturnCode_1 in hwaccel fallback test

sh.ErrorReturnCode is the abstract base; its __init__ does
`self.exit_code = self.exit_code`, which raises AttributeError unless
a concrete numeric subclass (ErrorReturnCode_1, _2, ...) is used. Every
other call site in this file already uses ErrorReturnCode_1 — this was
the lone outlier introduced with the SW-fallback test in 0340b4f4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(asset-processor): drop on-device video transcoding

On-device libx265 transcode wedged a Pi 4's celery worker for 99 min on a
single 4K60 H.264→HEVC pass during PR validation. Every supported board
already HW-decodes both H.264 and HEVC via the viewer's per-board mpv
hwdec dispatch (drm-copy / vaapi-copy / v4l2m2m-copy), so the re-encode
provided no playback benefit for the codecs operators actually upload.

- ``normalize_video_asset`` now runs ffprobe and writes codec / dims /
  fps / duration into ``metadata``; the asset file is never rewritten.
- Removes the envelope module, the re-render walker
  (``regenerate_for_envelop…